Percentile Finding Algorithm for Multiple Sorted Runs

Authors

  • Balakrishna R. Iyer
  • Gary R. Ricard
  • Peter J. Varman
Abstract

External sorting is frequently used by relational database systems for building indexes on tables, ordered retrieval, duplicate elimination, joins, subqueries, grouping, and aggregation; it would be quite beneficial to parallelize this function. Previous parallel external sorting algorithms found in the database literature used a sequential merge as the final stage of the parallel sort. This reduces the speedup gained through parallelism in earlier stages of the sort. The solution is to merge in parallel as well. Load-balanced parallel two-way merges and approximately load-balanced parallel multi-way merges are known. Measurements reported on parallel sorting that employs one of the approximate partitioning methods indicate that, even if the sort keys are randomly distributed, the load imbalance due to the approximation degrades the speedup due to parallelism. Sort key value skews, known to occur in database workloads, can only exacerbate this problem. We give, prove, and analyze an efficient exact method that can find any percentile of an arbitrary number of sorted runs. Application of our algorithm ensures load balance during the parallel merge. By removing the effect of skews of sort key values, which caused loss of speedup in previous approaches, our method can improve the speedup for parallel sorting on multiple processors. While we target our work to a parallel computer architecture of shared memory MIMD parallel processors, our results are also likely to be useful for other parallel computer architectures.

1 Introduction

The need for database MIPS per database installation is outstripping the uniprocessor MIPS supplied by computer vendors. External sorting is frequently invoked by relational database systems for building indexes on tables, ordered retrieval, duplicate elimination, joins, subqueries, grouping, and aggregation, and is known to be a time-consuming operation.
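As a rough illustration of what finding a percentile of several sorted runs means, the following Python sketch locates the key of a given rank and derives one cut position per run. It is not the algorithm of the paper, whose details are not given in this excerpt; it is a simple repeated-binary-search approach, and the names `kth_smallest` and `split_points` are hypothetical.

```python
from bisect import bisect_left, bisect_right


def kth_smallest(runs, k):
    """Return the k-th smallest key (1-indexed) across all sorted runs."""
    def count_le(x):
        # Number of keys <= x over all runs.
        return sum(bisect_right(r, x) for r in runs)

    def count_lt(x):
        # Number of keys < x over all runs.
        return sum(bisect_left(r, x) for r in runs)

    for run in runs:
        # Binary search for the first position whose key has rank >= k.
        lo, hi = 0, len(run)
        while lo < hi:
            mid = (lo + hi) // 2
            if count_le(run[mid]) >= k:
                hi = mid
            else:
                lo = mid + 1
        # The candidate is the answer only if fewer than k keys precede it.
        if lo < len(run) and count_lt(run[lo]) < k:
            return run[lo]
    raise ValueError("k must be between 1 and the total number of keys")


def split_points(runs, k):
    """Return one cut index per run so that the prefixes together hold
    exactly k keys and no key before a cut exceeds a key after a cut."""
    if k == 0:
        return [0] * len(runs)
    pivot = kth_smallest(runs, k)
    # Take every key strictly below the pivot ...
    cuts = [bisect_left(r, pivot) for r in runs]
    spare = k - sum(cuts)
    # ... then hand out keys equal to the pivot until exactly k are taken.
    for i, r in enumerate(runs):
        take = min(spare, bisect_right(r, pivot) - cuts[i])
        cuts[i] += take
        spare -= take
    return cuts


runs = [[1, 3, 5, 7], [2, 2, 6], [4, 8, 9, 10]]
cuts = split_points(runs, 6)
```

With N keys in total and p processors, calling `split_points` with k = i * N / p for i = 1, ..., p - 1 would give cut positions that delimit p slices of equal size, each of which could then be merged independently. Ties are handled explicitly so that the slices stay exact even when many keys are equal.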
External sorting on multiple processors is, therefore, an important and beneficial problem to be solved for relational database systems. For purposes of exposition and analysis we will assume a shared memory, shared data computer architecture. Yet the reader may find much of our work equally applicable to loosely coupled architectures.

Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Very Large Data Base Endowment. To copy otherwise, or to republish, requires a fee and/or special permission from the Endowment. Proceedings of the Fifteenth International Conference on Very Large Data Bases.

1.1 Background

We start by examining some of the parallel sort algorithms in the database literature. Perhaps the simplest technique is to partition the sort key range prior to accessing the database. Selected rows from the database are assigned to different buckets, each bucket corresponding to a key range. Rows in different buckets are sorted in parallel and then the result is concatenated or used for further processing. This simple technique is not practical for the following two reasons: (a) tight upper and lower bounds for the sort key range are not easily determined before the rows from a database are selected, and (b) it is difficult to partition the sort key range so that about the same number of rows will have sort keys belonging to any one partition, so load balancing is compromised. Bitton et al. (BIT83) propose two external parallel sort algorithms for use in database systems that they call parallel binary merge and block bitonic sort. Both algorithms employ the binary two-way merge of sorted runs.
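The key-range partitioning technique described above, and the way skew upsets it, can be illustrated with a small Python sketch. The function name `range_partition`, the boundary values, and the sample keys are all assumptions made for this example, not data from the paper.

```python
from bisect import bisect_right


def range_partition(keys, boundaries):
    """Assign each key to a bucket by key range.

    Bucket i holds keys with boundaries[i-1] <= key < boundaries[i];
    the first and last buckets are open-ended.
    """
    buckets = [[] for _ in range(len(boundaries) + 1)]
    for key in keys:
        buckets[bisect_right(boundaries, key)].append(key)
    return buckets


boundaries = [25, 50, 75]  # fixed in advance, before the data is seen

# Uniformly spread keys: every bucket receives the same share of work.
uniform = list(range(100))
uniform_sizes = [len(b) for b in range_partition(uniform, boundaries)]

# Skewed keys: one value dominates, so one bucket receives most of the rows.
skewed = [1] * 70 + list(range(70, 100))
skewed_sizes = [len(b) for b in range_partition(skewed, boundaries)]

# Sorting each bucket independently and concatenating yields sorted output.
result = [k for b in range_partition(skewed, boundaries) for k in sorted(b)]
```

In this toy case the uniform keys split evenly, while the skewed keys leave one bucket with 70 of the 100 rows and another bucket empty, so the processor holding the large bucket would determine the elapsed time.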
In a typical external sort for database systems, sorted runs are first created as temporary relations on disk and then merged. If we use algorithms that rely on a two-way merge, the number of I/Os and of temporary relation inserts and fetches per sorted row will be of the order of log2(NR), where NR is the number of initial sorted runs created. Both I/Os and temporary relation interactions are expensive, and they make algorithms based on a two-way merge expensive, suggesting that the two algorithms are primarily useful for internal sort, where there is no I/O or temporary relation interaction. Even for parallel internal sorting, Bitton's parallel binary merge algorithm suffers from the drawback that its last phase is essentially a sequential binary merge, and it is therefore impossible to finish the sort earlier than the time for one sequential pass through the entire data, regardless of the number of processors. While it may not be possible to avoid examining all the data for sorting, we must avoid examining it all serially if we are to obtain speedup linear in the number of processors (over a reasonable range of values for the number of processors). Valduriez and Gardarin (VAL84) generalized the parallel binary merge algorithm to a k-way merge algorithm. In their algorithm, p processors merge different sets of runs in parallel using a k-way merge. We then have p runs. These are merged (assuming p < k) sequentially on a single processor. A pipelined version of this algorithm has been proto-


Similar resources

External Sorting on Flash Memory Via Natural Page Run Generation

The increasing popularity of flash memory means more database systems will run on flash memory in the future. One of the most important database operations is the external sort. Hence, this paper is focused on studying the problem of efficient external sorting on flash memory. In contrast to most previous work, we target the situation where previously sorted data has become progressively un-sor...


Generation of Long Sorted Runs on the IBM SP2

In the first phase of sorting a large file, sorted sequences, called runs, are generated. In the second phase, the runs are merged into a sorted file. The merge time can be much greater than the runs-generation time. Generating longer runs and thus fewer runs in the first phase may greatly reduce the merge time. In this paper, we present a parallel algorithm that can utilize the broadcast capab...


Finding the Convex Hull of a Sorted Point Set in Parallel

We present a parallel algorithm for finding the convex hull of a sorted planar point set. Our algorithm runs in O(log n) time using O(n/log n) processors in the CREW PRAM computational model, which is optimal. One of the techniques we use to achieve these optimal bounds is the use of a parallel data structure which we call the hull tree.


Run Generation Revisited: What Goes Up May or May Not Come Down

We revisit the classic problem of run generation. Run generation is the first phase of external-memory sorting, where the objective is to scan through the data, reorder elements using a small buffer of size M, and output runs (contiguously sorted chunks of elements) that are as long as possible. We develop algorithms for minimizing the total number of runs (or equivalently, maximizing the aver...


A Linear Algorithm for the Connected Domination Problem on Circular-Arc Graphs

A connected domination set of a graph is a set D of vertices such that every vertex not in D is adjacent to at least one vertex in D, and the induced subgraph of D is connected. Given a circular-arc graph G in arc model with n sorted arcs, we present an algorithm for finding a minimum connected domination set of G. Our algorithm runs in O(n) time and space.



Publication date: 1989